Physical limits on cooperative protein-DNA 
binding and the kinetics of combinatorial 
transcription regulation 



Nico Geisel 

Departament de Física Fonamental, Facultat de Física, 
Universitat de Barcelona, Barcelona, Spain 

Ulrich Gerland 
Arnold-Sommerfeld Center for Theoretical Physics 
and Center for Nanoscience (CeNS), 
Ludwig-Maximilians-Universität, München, Germany 

September 16, 2011 



Abstract 



Much of the complexity observed in gene regulation originates from cooperative 
protein-DNA binding. While studies of the target search of proteins for their spe- 
cific binding sites on the DNA have revealed design principles for the quantitative 
characteristics of protein-DNA interactions, no such principles are known for the 
cooperative interactions between DNA-binding proteins. We consider a simple 
theoretical model for two interacting transcription factor (TF) species, searching 
for and binding to two adjacent target sites hidden in the genomic background. We 
study the kinetic competition of a dimer search pathway and a monomer search 
pathway, as well as the steady-state regulation function mediated by the two TFs 
over a broad range of TF-TF interaction strengths. Using a transcriptional AND- 
logic as exemplary functional context, we identify the functionally desirable regime 
for the interaction. We find that both weak and very strong TF-TF interactions are 
favorable, albeit with different characteristics. However, there is also an unfavor- 
able regime of intermediate interactions where the genetic response is prohibitively 
slow. 

Key words: cooperative protein-DNA binding, transcription regulation, target 
search, monomer vs. dimer pathway, DNA-protein complex assembly 

Introduction 

Cells respond to many biochemical signals by adjusting their gene expression lev- 
els, often in a combinatorial way where the transcription rate of a given gene is a 
nonlinear function of several inputs. The entire signal transduction cascade, be- 
ginning with the detection of the biochemical signals and culminating in a changed 
intracellular protein concentration, is generally believed to be under strong selec- 
tive pressure for rapid and well-adjusted responses in competitive environments. 
An important step in this cascade involves proteins belonging to the large class 
of transcription factors (TFs) which convey the external signal and trigger the 
appropriate genetic response by binding to specific binding sites on the genomic 
DNA. The search process of individual TFs for their functional target sites hidden 
within millions of non-functional sites on the DNA is well characterized, see e.g. 
[1]-[7]. This has led to an understanding of the tradeoffs inherent in the choice of 
TF-DNA interaction parameters, when both a rapid search and sufficient 
equilibrium discrimination for the functional sites are required [5]-[10]. 

However, the experimental timescale for the search process, as inferred e.g. 
from single-molecule measurements in vivo [11], is surprisingly short compared to 
the timescale for significant change in gene expression levels: Whereas a TF target 
site is occupied within a minute even at low TF concentrations, the concentration 
of the protein expressed from the target gene typically changes significantly only 
over a timescale of several minutes, due to the slow kinetics of protein synthesis 
and degradation. Hence, the search time is only a fraction of the total response 
time, and it is unclear whether fine-tuning of TF-DNA interaction parameters is 
needed for kinetic reasons. On the other hand, even in bacteria many genes are 
co-regulated by a combination of different TFs [12]-[20], while the search process 
studied so far is that of a single TF species, i.e. multiple TF molecules of the same 
type. A salient question is whether the timescale of transcription control increases 
with the complexity of the implemented regulatory function. 

To explore this question, we consider a simple theoretical model for the ki- 
netics of combinatorial transcription regulation. We focus on the example of an 
AND-like cis-regulatory function implemented by two TFs, referred to as 'A' and 
'B', which bind cooperatively to two adjacent target sites to activate a gene. 
This scenario is exemplified by the melAB promoter of E. coli, where CRP and 
MelR bind cooperatively to activate transcription [19]. Our model is sufficiently 
generic that it can be applied to a variety of cooperative protein-DNA binding 
situations. However, the example of the "AND-gate" is particularly well suited to 
illustrate the basic effects and functional tradeoffs that become apparent when the 
interaction parameters are varied. Compared to the well-studied case of a single 
TF species, the new aspect here is the mutual interaction between the TFs (cf. 
Fig. 1), which is quantified by the dimensionless cooperativity 
$\omega = e^{-E_{int}/k_B T}$. This quantity is only a measure of the interaction 
strength between TFs, with $E_{int}$ the effective free energy of the interaction 
and $k_B T$ the energy scale of thermal fluctuations. It is not related to the Hill 
coefficient, which depends on the number of components involved in a cooperative 
complex. The strengths of direct protein-protein interactions vary over a broad 
range, with dissociation constants between the femtomolar and the centimolar 
regime [21]. Biochemically feasible $\omega$ values can therefore span many orders 
of magnitude, from weak transient interaction with $1 < \omega < 1000$ to strong 
dimerization with $\omega$ ~ 10^ or larger. Depending on this value, the kinetics of 
cooperative protein-DNA binding will be dominated either by a "monomer pathway" 
or by a "dimer pathway" [22], [23]. How do the response time and the steady-state 
levels of a regulatory module depend on the cooperativity? And which regime of 
$\omega$ values could be favorable in which functional context? 

Our model, illustrated in Fig. 2, generalizes the classic facilitated diffusion 
model [1] to two interacting protein species. It incorporates the basic kinetic 
moves, i.e. binding to a DNA site, sliding along the DNA, and unbinding from 
the DNA, for monomers as well as for dimers. In addition, dimers can form or 
break up either in solution or while bound to the DNA. We characterize the be- 
havior of our model using a variety of analytical and numerical approaches to 
calculate equilibrium and kinetic observables over a parameter range chosen to 
permit the exploration of functional tradeoffs in a bacterial system such as E. coli. 
For instance, in bacterial transcription regulation, a faster response is generally 
expected to be advantageous, whereas the steady-state transcription levels of a cis- 
regulatory function must be adjusted to yield the optimal protein concentrations 
for the biological conditions represented by the input signals [24]-[26]. Therefore, 
when considering different choices of $\omega$, we compare regulatory systems that lead 
to the same steady-state levels. The exploration of our model leads us to two 
favorable regimes of $\omega$, corresponding to weak (and often promiscuous) 
interactions and very strong heterodimerization, respectively. On the other hand, our 
model predicts that the search kinetics will be prohibitively slow at intermediate 
$\omega$ values, at least when the protein copy number is small, as is usually the case 
for bacterial transcription factors. In the 'Discussion' section, we consider 
biological implications of these theoretical findings and discuss possible experiments 
to characterize the cooperative search problem and the kinetics of combinatorial 
transcription regulation. 

Results 

Cooperativity and regulatory function 

Cooperative protein-DNA binding is employed in diverse functional contexts. For 
some functions, many molecules of the same protein polymerize along DNA, e.g. 
RecA for homologous recombination [27] or single-strand-binding-protein during 
DNA replication [28]. In these cases, the role of the protein-protein interaction is 
to enhance the probability of obtaining continuous DNA coverage rather than a 
patchwork of randomly positioned molecules. Here we focus on the functional con- 
text of transcription regulation where cooperative protein-DNA binding is involved 
in the processing of input signals. These signals are integrated and transformed 
into a single output, the transcription rate of a gene [22]. 

The cooperative binding of a transcription factor (TF) with RNA polymerase 
(RNAp) transfers a signal, by regulating the effective binding threshold for RNAp 
via the concentration of active TF ('regulated recruitment' [29], see Fig. 1A). When 
two different TFs bind cooperatively and each makes contact with RNAp to activate 
transcription, see Fig. 1C, two signals are effectively integrated into a single 
output. A similar case is depicted in Fig. 1B, where TF binding is assisted by a 
helper protein that does not make contact with RNAp itself. This motif resembles, 
for instance, the regulation of the melAB promoter, which is co-dependent 
on the transcription factors CRP and MelR [19]. The helper can also be another 
molecule of the same TF, making the response to its signal more switch-like 
(increased effective Hill coefficient). 

The molecular function in the 'signal transfer scenario' of Fig. 1B is quantitatively 
described by the probability $p_b$ to find a protein B bound as a function of 
the concentration of a protein A that binds adjacently. In contrast, for the 'signal 
integration scenario', the functional activity is captured by the probability $p_{ab}$ 
that two DNA sites a and b are both occupied by the matching TF proteins. In the 
following, we will refer to both quantities, $p_b$ and $p_{ab}$, simply as the 'average 
activity' for the respective scenario. We envisage that selection acts on these average 
activities, as well as on a characteristic time scale, the 'response time' $\tau$, 
associated with the kinetics of each mechanism. Here, $\tau$ corresponds to the typical 
delay for adjusting the activity to a new average level after a change in the input 
signal. In a steady state, $\tau$ is also a characteristic time scale of spontaneous 
fluctuations in the activity (noise). Importantly, both the average activity and the 
response time depend on the binding cooperativity $\omega$. 

Average activity 

Before we introduce our full model, it is instructive to consider the average activity 
within the simple approximation where we focus only on two binding sites a and 
b and ignore binding of the TFs to the rest of the DNA. This consideration will 
be useful in particular as a guide for our detailed study of possible tradeoffs in the 
choice of $\omega$ within the full model. 

We first consider the signal transfer scenario shown in Fig. 1B. In equilibrium, 
the probability $p_b$ that site b is occupied by one of $N_B$ available molecules of 
type B is the normalized sum of the statistical weights for all states where b is 
occupied [30]. In the absence of A, i.e. for $N_A = 0$, this is just 
$p_b = q_b/(1 + q_b)$, with the statistical weight for an unoccupied site set to one 
and $q_b = N_B/n_b$ denoting the relative weight for b to be occupied. Here, the 
'binding threshold' $n_b$, which corresponds to the number of B molecules needed to 
obtain a 50% average occupancy of b in the absence of A, is directly connected to 
the effective equilibrium binding constant $K_d$ of B to b and the cell volume via 
$n_b = K_d V_{cell}$. In the presence of A, the occupancy of b increases to 

$$ p'_b = \frac{q'_b}{1 + q'_b}, \qquad q'_b = q_b \, [1 + (\omega - 1)\, p_a], \tag{1} $$

where $p_a = q_a/(1 + q_a)$ is the average occupancy of a in the absence of B. Thus, 
the presence of A boosts the statistical weight for B binding by the 'regulation 
factor' [30], i.e. the factor in square brackets in Eq. 1. Intuitively, this factor may 
be thought of either as a boost in the local effective concentration of B [22], or as 
a decrease in the effective binding threshold $n_b$ (the latter interpretation is closer 
to the underlying physics). 
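
As a numerical illustration of Eq. 1, the following minimal sketch evaluates the regulation factor and the boosted occupancy of site b. The function names and the numerical values of the cooperativity and the statistical weights are illustrative assumptions, not parameters taken from this study.

```python
def occupancy(q):
    """Equilibrium occupancy of a site with relative statistical weight q."""
    return q / (1.0 + q)


def regulation_factor(omega, q_a):
    """Factor in square brackets in Eq. 1: 1 + (omega - 1) * p_a."""
    return 1.0 + (omega - 1.0) * occupancy(q_a)


def boosted_occupancy(q_b, omega, q_a):
    """Occupancy p'_b of site b in the presence of A (Eq. 1)."""
    return occupancy(q_b * regulation_factor(omega, q_a))


# Illustrative values (assumptions): cooperativity and relative weights
omega, q_a, q_b = 100.0, 1.0, 0.01

p_b = occupancy(q_b)                             # occupancy of b without A
p_b_prime = boosted_occupancy(q_b, omega, q_a)   # occupancy of b with A
fold_change = p_b_prime / p_b

print("regulation factor:", regulation_factor(omega, q_a))
print("p_b:", p_b, " p'_b:", p_b_prime, " fold-change:", fold_change)
```

With these values the site a is half occupied, so the regulation factor is roughly half the cooperativity, and the fold-change in occupancy stays below the regulation factor.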

Importantly, the regulation factor cannot exceed the cooperativity value $\omega$, 
and it reaches $\omega$ only if $p_a$ takes on its maximal value of one. As a consequence, 
the cooperativity $\omega$ is also an upper bound on the fold-change in b-occupancy 
induced by a change in A concentration, since $p'_b/p_b < q'_b/q_b$. This constitutes a 
physical constraint on $\omega$ that arises from the equilibrium statistics of cooperative 
protein-DNA binding, 

$$ \omega > \phi \qquad \text{[equilibrium constraint]}, \tag{2} $$

i.e. the cooperativity must be larger than the required fold-change in the output 
signal ($\phi = p'_b/p_b$ for the signal transfer scenario). On the molecular level, 
this constraint can be implemented by a sufficiently strong direct protein-protein 
interaction or by indirect mechanisms of cooperativity, e.g. via collaborative 
competition [31] or DNA bending [32]. 
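
The chain of inequalities behind the equilibrium constraint can be written out in one line. This is a short derivation from Eq. 1, assuming a cooperativity larger than one (so that the boosted weight exceeds the bare weight) and using that the occupancy of site a cannot exceed one:

```latex
\frac{p'_b}{p_b}
  = \frac{q'_b}{q_b}\,\frac{1 + q_b}{1 + q'_b}
  < \frac{q'_b}{q_b}
  = 1 + (\omega - 1)\,p_a
  \le \omega .
```

The strict inequality comes from $q'_b > q_b$, and the final bound becomes an equality only when $p_a = 1$.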

For the signal integration scenario in Fig. 1C, the definition of the fold-change 
is different, but the constraint (2) holds as well. Here, the relevant fold-change is 
the average activity in the presence of both inputs relative to the average activity 
with only a single input, $\phi = p_{ab}/p_a$ or $\phi = p_{ab}/p_b$, where 

$$ p_{ab} = \frac{\omega\, q_a q_b}{1 + q_a + q_b + \omega\, q_a q_b}. \tag{3} $$

This fold-change is then transferred to the promoter activity in the example 
considered in Fig. 1C. Taken together, when considering steady-state activities, both 
the signal transfer and the signal integration function benefit from larger 
cooperativities, since a large $\omega$ allows for tight regulation. However, since large 
binding energies often lead to slow kinetics, we will explore whether a tradeoff exists 
between the fold-change in average activity and the response time. 

Full model 

We now introduce a full kinetic model for the cooperative target search, which is 
based on the energies of TF binding states and the transition rates between these 
states, as illustrated in Fig. 2. We consider a single circular genome of length $L_G$ 
(in units of base pairs) inside a cell of volume $V_{cell}$, with a single pair of 
adjacent target sites for A and B. The unbound state of free TFs in solution is our 
reference state, with its energy set to $E_{free} = 0$. If A and B dimerize in solution, 
the interaction energy $E_{int} < 0$ is gained, while entropy is lost, since the number 
of possible states is reduced by a factor that we write as $V_{TF}/V_{cell}$, with a 
microscopic volume $V_{TF}$ on the order of the size of a TF. Each TF molecule has 
$L_G$ possible binding sites on the DNA (indexed by $i$ with $0 < i \le L_G$) with the 
respective bound-state energies $E_i^A$ and $E_i^B$. These bound-state energies are 
either equal to $E_{ns} < E_{free}$, if the protein-DNA interaction is non-specific, or 
they take on a lower value if the binding sequence favors specific protein-DNA 
contacts, $E_i^A, E_i^B < E_{ns}$. We denote by $L$ the number of base pairs on the DNA 
which are occupied by a bound monomer (occupied DNA is inaccessible to other TF 
molecules), and we posit that A and B can form a DNA-bound dimer only when B binds 
directly upstream of A. 

For the kinetic rates, we assume that all binding reactions are diffusion limited. 
For simplicity, we take the same rate constant $k_a$ for the binding of two protein 
molecules in solution and for the association of a TF molecule with a specified 
DNA site (thus, the total rate of TF binding anywhere on the DNA is $L_G k_a$, if 
no DNA site is occupied already). The random diffusion of TFs along the DNA 
contour occurs with the basal sliding rate $k_{sl}$. When neighboring sites have 
different energies, the sliding rate is the basal rate $k_{sl}$ from the higher to the 
lower energy state, while the reverse process occurs at the reduced rate 
$k_{sl} \exp(-\Delta E/k_B T)$, with $\Delta E > 0$ the energy difference, such that 
detailed balance is respected (in the following we assume all energies to be in units 
of $k_B T$, which amounts to setting $k_B T = 1$). The rates for all other possible 
reactions are similarly obtained from detailed balance. For instance, the unbinding 
rate $k_{off}$ of a monomer from a non-specific DNA site is determined by 
$k_{off}/k_a = (V_{cell}/V_{TF})\, e^{E_{ns} - E_{free}}$, and the dissociation rate $k_d$ 
of a free dimer by $k_d/k_a = (V_{cell}/V_{TF})\, e^{E_{int}}$. Note that monomers 
can also unbind or slide away from a DNA site while simultaneously dissociating 
from a cooperatively bound partner (thus disrupting the DNA-bound dimer, see 
Fig. 2D, top right). In that case detailed balance dictates that monomer sliding 
and dissociation rates decrease by a factor $1/\omega$ due to the loss of the 
dimerization energy $E_{int}$. 
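
The sliding rule described above can be sketched as a small helper that returns the forward and backward hopping rates between two neighboring sites and satisfies detailed balance by construction. The function names and the numerical values (basal rate, site energies, cooperativity) are illustrative assumptions; energies are in units of the thermal energy.

```python
import math


def sliding_rates(e_from, e_to, k_sl):
    """Hopping rates (forward, backward) between two neighboring DNA sites.

    Downhill moves occur at the basal rate k_sl; uphill moves are reduced by
    the Boltzmann factor exp(-dE), with dE > 0 the energy difference.
    """
    delta = e_to - e_from
    forward = k_sl if delta <= 0 else k_sl * math.exp(-delta)
    backward = k_sl if delta >= 0 else k_sl * math.exp(delta)
    return forward, backward


def rate_when_leaving_partner(rate, omega):
    """Rate of a move that breaks a DNA-bound dimer: reduced by 1/omega."""
    return rate / omega


# Illustrative values (assumptions): a non-specific site next to a target site
k_sl = 1.0e6           # hypothetical basal sliding rate, per second
e_ns, e_target = -5.3, -15.0

into_target, out_of_target = sliding_rates(e_ns, e_target, k_sl)
print("into target:", into_target, " out of target:", out_of_target)
print("ratio:", into_target / out_of_target, " expected:", math.exp(e_ns - e_target))
print("out of target, leaving partner:", rate_when_leaving_partner(out_of_target, 100.0))
```

The ratio of forward to backward rate equals the Boltzmann factor of the energy difference for either sign of that difference, which is the detailed-balance condition.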

Within the framework of this full model, we calculate the steady-state activities 
as described in Section S1 in the Supporting Material (this exact calculation 
includes the effect of the genomic background and mutual exclusion of overlapping 
binding sites, both neglected in the simple discussion above). We determine aver- 
age search times numerically, using kinetic Monte Carlo simulations as described 
in Section S2, and we also develop analytical approximations further below and in 
Section S3. 
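
For orientation, a kinetic Monte Carlo estimate of a mean first passage time can be sketched generically as follows. This is a standard Gillespie-type scheme on an abstract set of states, not the simulation of Section S2; the state labels, the rate table, and the function names are hypothetical.

```python
import random


def first_passage_time(rates, start, target, rng):
    """Simulate one trajectory; return the time at which `target` is first reached.

    `rates` maps each non-target state to a list of (next_state, rate) pairs.
    """
    state, t = start, 0.0
    while state != target:
        moves = rates[state]
        total = sum(rate for _, rate in moves)
        # Waiting time to the next event is exponential with the total exit rate
        t += rng.expovariate(total)
        # Choose the event with probability proportional to its rate
        threshold = rng.random() * total
        cumulative = 0.0
        for next_state, rate in moves:
            cumulative += rate
            if threshold < cumulative:
                state = next_state
                break
        else:
            state = moves[-1][0]  # guard against floating-point round-off
    return t


def mean_first_passage_time(rates, start, target, n_runs, seed=0):
    """Average the first passage time over n_runs independent trajectories."""
    rng = random.Random(seed)
    return sum(first_passage_time(rates, start, target, rng)
               for _ in range(n_runs)) / n_runs


# Toy example (assumption): a single step at rate 2 per unit time
toy_rates = {"free": [("bound", 2.0)]}
print(mean_first_passage_time(toy_rates, "free", "bound", n_runs=20000))
```

For the single-step toy example the estimate should approach the inverse of the rate as the number of runs grows.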

We choose the parameters of our full model to roughly reflect the situation 
in a bacterium such as E. coli. We set the genome length to $L_G = 5 \cdot 10^6$ bp, 
choose a cell volume of $V_{cell} = 5\ \mu m^3$, and consider DNA binding sites of 
length $L = 15$ bp. The sliding rate $k_{sl}$ can be determined from recent 
measurements of the one-dimensional diffusion constant for TF sliding on non-specific 
DNA [11], [33], which obtained values close to $0.05\ \mu m^2/s$, corresponding to a 
sliding rate of about $k_{sl}$ = 10^/s. The same experiments also determined a 
residence time of 0.3 to 5 ms for TF molecules on non-specific DNA before 
dissociation. At the given genome length, this fixes our rate constant $k_a$ to be in 
the range $0.4 \cdot 10^{-4}$/s to $6 \cdot 10^{-4}$/s, and we set $k_a = 10^{-4}$/s in 
the following. Unless otherwise stated, we will assume, for simplicity, that the 
target sites a, b are the only specific binding sequences in the genome, both with 
the same binding energy. We measure all energies in units of $k_B T$. We set 
the strength of the non-specific protein-DNA interaction by requiring that a single 
TF spends on average equal time unbound in solution as bound somewhere on 
the DNA. This parameter choice corresponds to the well-characterized optimum 
for the search process of a single TF species [U [M]; see also the discussion of 
this point further below. Within our energy model, this corresponds to a non-specific 
binding energy $E_{ns} = \log(L_G \cdot V_{TF}/V_{cell}) = -5.3$, assuming a reaction 
volume $V_{TF} = 1\ nm^3$. In our model, the effective dimerization rate is increased by 
the presence of the DNA (which acts as a scaffold for the interaction). A similar 
increase was observed experimentally in a study of the Jun-Fos DNA complex [23]. 
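
The link between the measured residence time and the association rate constant can be made explicit with a few lines of arithmetic. This is a sketch under two assumptions taken from the text: the total binding rate to the DNA is the genome length times the rate constant per site, and a TF spends equal time unbound and bound, so the mean unbound time equals the residence time. The function name and the genome length of five million base pairs are assumptions of the sketch.

```python
GENOME_LENGTH_BP = 5.0e6  # assumed genome length in base pairs


def association_rate(residence_time_s, genome_length=GENOME_LENGTH_BP):
    """Per-site association rate k_a such that the mean time in solution,
    1 / (L_G * k_a), equals the residence time on non-specific DNA."""
    return 1.0 / (genome_length * residence_time_s)


k_a_fast = association_rate(0.3e-3)  # shortest residence time, 0.3 ms
k_a_slow = association_rate(5.0e-3)  # longest residence time, 5 ms

print("k_a range per second:", k_a_slow, "to", k_a_fast)
```

The two limits bracket the order of magnitude of the rate constant used in the model; shorter residence times correspond to larger rate constants.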

Quantitative Analysis 

We now analyze how the quantitative characteristics of the two-protein-species 
system depend on the cooperativity $\omega$. The cooperative target state where both 
target sites are occupied can be reached via two distinct kinetic pathways: In the 
"monomer pathway", A and B separately search for their specific target sites in 
multiple rounds, alternating between one-dimensional diffusion along the DNA and 
three-dimensional diffusion in the cytoplasm to a new position on the DNA. In this 
pathway, A and B arrive independently, i.e., one after the other, at their specific 
target sites. By contrast, in the "dimer pathway", the dimer forms beforehand, 
either in solution or in the DNA background, such that A and B reach their target 
sites simultaneously (cf. Fig. 2). Clearly, we expect the monomer pathway to 
dominate for weak TF-TF interactions (small $\omega$), while the dimer pathway should 
dominate for large $\omega$. But what is the behavior of the overall search time $\tau$ 
that results from the kinetic competition between the two pathways? 

Before performing the kinetic analysis, we first characterize the steady state 
characteristics of our full model. We will focus on the signal integration scenario 
in the remainder of this study; the behavior in the signal transfer scenario is qual- 
itatively similar. As discussed above, the most relevant steady state characteristic 
in the functional context of gene regulation is the attainable fold-change of the av- 
erage activity, which determines how tightly a gene can be regulated. We assume 
that the expression level of the regulated gene in the high-activity state, when both 
TF species can bind the promoter (the "ON-state") is constrained to its optimal 
level by evolutionary selection, e.g. the optimal level of a metabolic enzyme in the 
presence of its substrate [24], [25]. The fold-change between the ON-state and the 
OFF-state (in which only one of the TFs can bind) then determines how tightly 
the production of the protein can be suppressed under conditions when it would 
be useless or even detrimental. Hence, when we consider the system at different 
cooperativity values $\omega$, we take for granted that another system parameter is 
adjusted to keep the ON-state level constant. Specifically, we will assume that this 
compensation occurs via the target binding threshold, which is programmable via 
the DNA sequence of the target site [10]. In other words, we compensate a weaker 
protein-protein interaction with a stronger protein-target interaction such that the 
ON-state level $p_{ab}$ remains constant. In E. coli and yeast, binding sites indeed 
tend to deviate from the consensus motif when multiple TFs bind next to each other 
in the cis-regulatory region [151 [El ES]. For simplicity, we consider a symmetric 
pair of TFs, which have different binding sequences, but the same energetics, such 
that $q_a = q_b$. 

Fig. 3B shows the resulting fold-change $\phi = p_{ab}/p_a$ for the full model as 
a function of the cooperativity (on a double-logarithmic scale), with the three 
curves corresponding to different ON-state levels $p_{ab}$. The fold-change increases 
monotonically with the cooperativity, roughly as $\phi \sim \sqrt{\omega}$, before it 
saturates at a maximal level that depends slightly on the ON-state level. For 
$\omega \gg 1$, the dependence on the ON-state level $p_{ab}$ is non-monotonic, with a 
larger $\phi$ for $p_{ab} = 0.5$ than for both $p_{ab} = 0.1$ and $p_{ab} = 0.9$. Much of 
this behavior can be understood already within the simple approximation of Eqs. 1 
and 3 as follows: For large $\omega$, cooperative binding to the targets becomes dominant 
in the ON-state, such that the non-cooperative contribution $q_a + q_b$ in the 
denominator of Eq. 3 can be neglected. One then finds that 
$\phi \approx \sqrt{\omega\, p_{ab} (1 - p_{ab})}$, explaining the behavior in the 
intermediate $\omega$ range of Fig. 3B, i.e. the $\sqrt{\omega}$-dependence and the 
non-monotonic dependence on the ON-state level $p_{ab}$. However, the saturation of the 
fold-change at very large $\omega$ is beyond this simple approximation, which neglects 
the background DNA and assumes that the TFs hetero-dimerize only on the target. 
This assumption breaks down in the strongly-interacting regime, as shown in 
Fig. 3A, which plots the equilibrium probability to find the TFs as a hetero-dimer. 
Dimers become prevalent in the background when the cooperativity outweighs the 
entropic cost of dimerization. If the non-specific DNA interaction of monomers 
is optimized for independent search (see below), the dimerization probability is 
simply $P_{dimer}(\omega) = \omega/(\omega + 2 L_G)$ (see Section S1). Further increase of 
$\omega$ has no significant effect on the fold-change. Hence, the full model confirms our 
previous conclusion that a large cooperativity is generally beneficial for the 
steady-state response, but only up to a value of $\omega \sim L_G$. 
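
The square-root scaling within the simple two-site approximation can be checked numerically. The sketch below fixes the ON-state level, solves Eq. 3 for the statistical weight of a symmetric TF pair, and compares the resulting fold-change with the large-cooperativity estimate; it also evaluates the dimerization probability quoted above. Function names and numerical values are illustrative assumptions, and the genomic background of the full model is not included.

```python
import math


def on_state_activity(q, omega):
    """Eq. 3 for a symmetric pair, q_a = q_b = q."""
    return omega * q * q / (1.0 + 2.0 * q + omega * q * q)


def weight_for_on_level(p_on, omega):
    """Positive root q of omega*(1 - p_on)*q^2 - 2*p_on*q - p_on = 0,
    i.e. the weight that yields the ON-state level p_on at cooperativity omega."""
    a = omega * (1.0 - p_on)
    return (p_on + math.sqrt(p_on * p_on + a * p_on)) / a


def fold_change(p_on, omega):
    """phi = p_ab / p_a, with p_a = q / (1 + q) the single-input activity."""
    q = weight_for_on_level(p_on, omega)
    return p_on * (1.0 + q) / q


def fold_change_large_omega(p_on, omega):
    """Large-cooperativity estimate sqrt(omega * p_on * (1 - p_on))."""
    return math.sqrt(omega * p_on * (1.0 - p_on))


def dimer_probability(omega, genome_length):
    """P_dimer = omega / (omega + 2 * L_G)."""
    return omega / (omega + 2.0 * genome_length)


omega = 1.0e4  # illustrative cooperativity
for p_on in (0.1, 0.5, 0.9):
    print(p_on, fold_change(p_on, omega), fold_change_large_omega(p_on, omega))
print("P_dimer at omega = 2 L_G:", dimer_probability(1.0e7, 5.0e6))
```

At fixed cooperativity the fold-change is largest for the intermediate ON-state level, in line with the non-monotonic dependence described above.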

Next, we turn to the cooperative search process. We first consider the situation 
with only one molecule of each type ($N_A = N_B = 1$). Initially, both monomers 
are unbound. The cooperative search time $\tau$ corresponds to the first point in time 
when a and b are both occupied. Fig. 3C shows its mean, $\langle\tau\rangle$, as a 
function of $\omega$ for three different ON-state levels $p_{ab}$. Here, the symbols 
represent simulation results, where the average is taken over a large number of 
simulation runs (see Section S2 for details), while the solid lines represent an 
analytical approximation discussed below and in Section S3. Note that 
$\langle\tau\rangle$ is plotted in units of the monomer search time 
$\langle\tau_m\rangle$, which is defined as the average time needed by a monomer, e.g. 
of type A, to find its target a in the absence of B. This kinetic ratio, 
$\langle\tau\rangle/\langle\tau_m\rangle$, is a direct measure of the slowdown of 
cooperative regulation relative to the timescale for independent regulation. When the 
cooperativity is negligible ($\omega \approx 1$), Fig. 3C shows that the kinetic ratio 
is only slightly larger than one. In this regime, the second protein arrives 
independently and on the same timescale as the first, while each protein is stably 
bound by itself, such that the first protein typically does not unbind from its 
target before the second protein arrives. Indeed, the probability of such a "missed 
encounter" depends on the ON-state level $p_{ab}$ and is simply $1 - \sqrt{p_{ab}}$ 
when $\omega = 1$, which consistently explains the $p_{ab}$-dependence (at fixed 
$\omega = 1$) in Fig. 3C. 
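
The missed-encounter probability at vanishing cooperative interaction is easy to tabulate. For a symmetric pair without cooperativity, Eq. 3 factorizes into the product of the two single-site occupancies, so the single-site occupancy is the square root of the ON-state level. The sketch below evaluates the resulting probability for three ON-state levels; the function name and the chosen levels are illustrative assumptions.

```python
import math


def missed_encounter_probability(p_on):
    """Probability 1 - sqrt(p_ab) that the first TF has left its target
    before the partner arrives, for a symmetric pair at omega = 1."""
    return 1.0 - math.sqrt(p_on)


for p_on in (0.1, 0.5, 0.9):
    print("ON-state level", p_on, "-> missed encounters:",
          round(missed_encounter_probability(p_on), 3))
```

Higher ON-state levels correspond to more stably bound monomers and therefore to fewer missed encounters.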

With increasing cooperativity $\omega$, the cooperative search time also becomes 
longer. Note that our reference time scale, the monomer search time 
$\langle\tau_m\rangle$, is independent of $\omega$, such that the ratio plotted in 
Fig. 3C indeed shows the $\omega$-dependence of the absolute timescale for cooperative 
search. The slowdown scales with the square root of the cooperativity, 
$\langle\tau\rangle \sim \sqrt{\omega}$. This scaling reflects the mechanism 
underlying the slowdown, which is produced by an increasing probability of missed 
encounters: As the cooperativity is increased, our constraint of a constant $p_{ab}$ 
implies that a monomer bound to its target becomes less stable and detaches more 
often before its partner arrives. The cooperative search time is then determined by 
the number of times a TF must return to its target before finding the other target 
occupied, which is roughly $1/p_a$, the inverse of the probability that a single target 
is occupied. At intermediate $\omega$, this probability scales as 
$p_a \sim \omega^{-1/2}$, leading to the observed scaling. 

The increase of the search time $\langle\tau\rangle$ with $\omega$ is not indefinite, 
however, because the relative importance of the dimer pathway increases with $\omega$. 
The contribution of the dimer pathway is shown in Fig. 3D. It displays a sigmoidal 
form, with a narrow transition region where the cooperative search switches from the 
monomer mode to the dimer mode. This transition is accompanied by a peak in the 
cooperative search time in Fig. 3C. Note that this transition occurs at significantly 
smaller $\omega$ values than the transition in the equilibrium probability for 
hetero-dimerization shown in Fig. 3A. 

To understand the non-monotonic behavior of the cooperative search time in 
Fig. 3C, it is instructive to consider the extreme case of a purely dimeric search. 
Fig. 4 shows the purely dimeric search time (black line and circles) as a function 
of the dimer binding ratio, i.e. the relative probability $P_d/P_c$ to find a dimer 
on the DNA versus in the cytoplasm (top x-axis). Here, the binding ratio is 
varied by changing the non-specific binding strength $E_{ns}$. For comparison, the 
gray line and squares show the corresponding curve for a monomer (search time 
for a single target; monomer binding ratio on the bottom x-axis). Both curves 
display the same qualitative behavior, with the well-known optimum [H |3l] where 
the respective binding ratio equals one, i.e. the average time spent on the DNA 
matches the time spent in the cytoplasm. At larger binding ratios, the local 1D 
search becomes too redundant, whereas at smaller binding ratios TFs spend too 
large a fraction of their time in solution, not searching. However, the minima of the 
two search time curves do not coincide, since dimers bind DNA more tightly than 
monomers. Consequently, the protein-DNA interaction cannot be simultaneously 
optimized for monomer and dimer search. We generally assume that the protein-DNA 
interaction is optimal for monomers, since single TFs are the basic functional 
unit for transcription control in bacteria (see below for further discussion). At this 
point in Fig. 4, the purely dimeric search time is roughly a factor 10 longer than 
the monomer search time. Returning to Fig. 3, this factor corresponds to the level 
of the plateau that is reached for very large $\omega$ in Fig. 3C. 

We now consider again the intermediate $\omega$ range in Fig. 3C. With increasing 
$\omega$ the monomer pathway eventually becomes slower than the dimer pathway, due to 
the increasing probability of missed encounters. At the same time the dimerized 
state is increasingly stabilized. Upon dimerization of A and B in the background, 
it becomes more likely that this dimer localizes the target before it dissociates 
again into monomers. The increasing predominance of the faster dimer pathway 
explains the regime where the cooperative search time decreases with $\omega$. It also 
explains why the kinetic monomer-dimer transition in Fig. 3D occurs before the 
equilibrium monomer-dimer transition in Fig. 3A: even when the dimer fraction has 
not reached 50%, the dimer pathway can be kinetically dominant. At very large 
$\omega$, the monomer pathway is entirely negligible. The TFs form relatively stable 
dimers, either already in solution or when bound to non-target sites, subsequently 
search together for most of the time, and ultimately arrive at the target as a pair. 
This search mode is independent of the target binding energy, and the cooperative 
search time then becomes independent of $\omega$ and equal to the pure dimer search 
time plotted in Fig. 4. 

The cooperative search kinetics admits an analytical treatment that quantitatively 
describes the kinetic competition between the monomer pathway and the 
dimer pathway. This description takes a coarse-grained view of the problem, with 
effective transition rates between only four states, as depicted in Fig. S1. The 
initial state has both TFs A and B unbound in solution (state 2 in Fig. S1), from 
where the proteins either enter the dimer pathway by dimerizing (state 1) 
or one of them independently finds its target site on the DNA (state 3), each 
transition occurring at its own effective rate. From state 1, the dimer either 
locates its pair of target sites or reverts back to state 2 at an effective 
dissociation rate. Along the monomer pathway, from state 3, either the other TF 
locates its target as well, or the waiting TF leaves its target, leading back to 
state 2. In Section S3, we express the six effective rate constants in terms of the 
parameters of the full model, and then use the mean first passage time formalism to 
calculate the mean cooperative search time analytically. We have used this approach 
to obtain the curves in Fig. 3C and D, which agree well with the simulation data. 
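
The structure of such a four-state calculation can be sketched in closed form. The function below solves the standard mean first passage time equations for the scheme described above, starting from the state with both TFs unbound. It is a generic illustration, not the expressions of Section S3: the six rate names are hypothetical labels for the six effective rate constants, and the numerical values are arbitrary.

```python
def mean_cooperative_search_time(k_dimerize, k_mono_find, k_dimer_find,
                                 k_dimer_split, k_partner_find, k_mono_leave):
    """Mean first passage time from state 2 (both TFs unbound) to the target state.

    State 2 -> state 1 (dimer)         at k_dimerize
    State 2 -> state 3 (one TF bound)  at k_mono_find
    State 1 -> target                  at k_dimer_find
    State 1 -> state 2                 at k_dimer_split
    State 3 -> target                  at k_partner_find
    State 3 -> state 2                 at k_mono_leave
    """
    exit_dimer = k_dimer_find + k_dimer_split
    exit_mono = k_partner_find + k_mono_leave
    # Obtained by eliminating the mean times from states 1 and 3
    numerator = 1.0 + k_dimerize / exit_dimer + k_mono_find / exit_mono
    denominator = (k_dimerize * k_dimer_find / exit_dimer
                   + k_mono_find * k_partner_find / exit_mono)
    return numerator / denominator


# Arbitrary illustrative rates (assumptions), in inverse time units
stable = mean_cooperative_search_time(0.0, 2.0, 1.0, 1.0, 4.0, 0.0)
leaky = mean_cooperative_search_time(0.0, 2.0, 1.0, 1.0, 4.0, 4.0)
print("monomer pathway only, stable monomers:", stable)
print("monomer pathway only, unstable monomers:", leaky)
```

Raising the rate at which the waiting TF leaves its target lengthens the search, which is the missed-encounter effect in its simplest form.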

So far we have focused on the case of a single TF molecule of each type. We now 
turn to the general case where we have $N_A$ molecules of type A and $N_B$ molecules 
of type B. If we increase both molecule numbers simultaneously ($N_A = N_B = N$), 
mass action drives the monomer-dimer equilibrium towards the dimerized state. 
Fig. S2A shows the probability for a molecule to be dimerized, $P_{dimer}$, as a 
function of $\omega$, with the different curves corresponding to different $N$ values. 
As expected, the dimerization threshold of the sigmoidal curves moves to smaller 
$\omega$-values as $N$ is increased. Note that while we have treated the case of 
exactly one molecule for $N = 1$, we keep the number of proteins constant only on 
average for $N > 1$, via the chemical potential in the grand canonical ensemble (see 
Section S1 for details). This choice is technically motivated, but is also 
biologically meaningful, since proteins are constantly produced and degraded in cells 
and their numbers can at best be constant on average. 

Fig. S2B displays the N-dependence of the fold-change. In contrast to
Fig. 3B, the ON-state level is now kept fixed at p_AB = 0.5 and instead the different
curves are for different N values (the fold-change is defined here with respect to
the state where N_A = N and N_B = 0). For ω below the dimerization threshold,
the fold-change is independent of the molecule number N. However, as in Fig. 3B,
increasing ω no longer raises the fold-change once the dimerization threshold
is reached. As the dimerization threshold decreases with N, the fold-change
saturates at smaller ω and the maximal fold-change decreases as 1/√N.

The average time required for the parallel cooperative search with N_A = N_B =
N molecules is shown in Fig. S2C. As in Fig. 3, we have used the monomer
search time as the reference time scale, but now scaled by 1/N, since the expected
timescale for the parallel search of N monomers is ⟨τ_M⟩/N. Consequently, the fact
that all curves fall on top of each other in the regimes of weak and very strong
interaction shows that in these regimes the cooperative search time exhibits the
simple 1/N scaling, which corresponds to a linear increase of the frequency at
which the targets are visited by monomers (in the small ω regime) or by dimers
(in the large ω regime). In the intermediate regime, we find a more complex
dependence on N, indicated by the fact that the curves do not collapse. To understand
this dependence, we extend our simplified analytical expression developed above.
Under the conditions of interest here, the dimerization equilibrium P_dimer(ω) of
Fig. S2A is reached on a timescale much shorter than the cooperative search. As
detailed in Section S3, we can then approximate the search process as a parallel
search of N · P_dimer dimers and N · (1 − P_dimer) monomers of each kind, resulting
in



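\[
\langle \tau \rangle \approx \frac{1}{N} \left[ \frac{P_{dimer}(\omega)}{\langle \tau_D \rangle} + \frac{1 - P_{dimer}(\omega)}{\langle \tau_{A,B}(\omega) \rangle} \right]^{-1} \qquad (4)
\]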



Here, 1/⟨τ_A,B(ω)⟩ is the independent search rate of the monomers, which indirectly
depends on ω through the probability of missed encounters (see Section S3), while
the dimer search rate is 1/⟨τ_D⟩, as in Fig. 4. We used Eq. 4 to obtain the lines
in Fig. S2C, which display good agreement with the full simulation, showing that
the analytical approximation yields a useful description of the cooperative search
kinetics.
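
The parallel-search approximation described above can be evaluated in a few lines. This is a minimal sketch under the stated assumption that the search rates of the dimer and monomer sub-populations simply add; the function and argument names are hypothetical, and `tau_monomer` stands for the ω-dependent independent search time, which has to be supplied from elsewhere.

```python
def parallel_search_time(n, p_dimer, tau_dimer, tau_monomer):
    """Approximate cooperative search time for n molecules of each TF species.

    n * p_dimer dimers search at rate 1/tau_dimer each, and n * (1 - p_dimer)
    monomers of each kind search at rate 1/tau_monomer each; the rates add.
    """
    if not 0.0 <= p_dimer <= 1.0:
        raise ValueError("p_dimer must lie between 0 and 1")
    total_rate = n * (p_dimer / tau_dimer + (1.0 - p_dimer) / tau_monomer)
    return 1.0 / total_rate
```

In the two limits p_dimer = 0 and p_dimer = 1 the function returns the pure monomer and pure dimer search times divided by n, which reproduces the simple 1/N scaling of the weak and very strong interaction regimes.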

On a more qualitative level, Fig. S2C shows how the peak in the search time
at intermediate ω values is affected by N. The peak shifts to smaller ω values
with larger N, and also becomes less pronounced. From Fig. S2D, which shows
the weight of the dimer pathway in the cooperative search process according to
Eq. S26, we see that the position of the peak remains determined by the switch
from the monomeric to the dimeric search mode. The shifted switch to the dimeric
search mode, which occurs at smaller ω for larger N, also explains the reduction
in the peak height: the dimeric search mode takes over before the slowdown of
the monomeric search mode becomes dramatic. However, even with hundreds of
TF molecules of each species, we still find a peak in the cooperative search time,
which divides the ω values into three regimes, as discussed below.

Discussion 

We studied the kinetics and the equilibrium statistics of cooperative transcription 
factor-DNA binding to specific target sites in the genomic background. For our 
analysis, we considered the dimensionless cooperativity ω as a parameter with
a broad range of biochemically feasible values, and sought to identify functional 
tradeoffs associated with the choice of this value. We focused on the functional 
context of a signal integration scenario with AND-logic, but the results hold in a 
similar fashion for a signal transduction scenario, see Fig. 1. From this functional
context we derived the central assumption that the average activity of the regulated 
gene has an optimal level in the ON-state, such that there is a strong selection 
pressure to maintain this level fixed regardless of the ω value. We satisfied this
constraint by compensating changes in ω via the target site binding energy, which
is "programmable" through the binding site sequence [10]. Such a compensation 
has been observed in an analysis of combinatorial promoters, i.e. binding sites 
tend to deviate from the consensus motif when multiple TFs bind next to each 
other in the cis-regulatory region [55]. It is also biologically plausible as it does 
not interfere with the regulation of genes that are only regulated by one of the 
TFs or combinatorially with other TFs. 

Given this functional setting, we determined which fold-change in the steady-
state activity could be implemented at a given ω, and how the kinetic search time
depends on ω. The fold-change quantifies the discrimination in the promoter
output between the states where one or two input signals are present, while the search
time is a lower limit to the response time of the regulatory system. The search
process has contributions from a monomer and a dimer search pathway, the
relative weights of which we determined, again as a function of ω. In the regime of
weak protein-protein interactions, e.g. ω < 10^ – 10^, we found a tradeoff between
the kinetics and the steady-state behavior, in the sense that a higher fold-change
is associated with a slower response due to a longer assembly time for the protein
complex on the target site. This tradeoff is a consequence of gene activation via
the monomer pathway, where individual TFs visit their targets independently and
consecutively, possibly dissociating from the target before the cooperative
partner arrives ("missed encounters"). In this regime, search time and fold-change
both increase as ~ ω^(1/2). At larger ω, heterodimers are more stable, increasing
the probability that the target is located simultaneously by both proteins (dimer
pathway). At the same time, the missed encounters further slow down the
independent monomer search, to timescales larger than the dimeric search time. Thus,
a transition occurs where the dimer pathway gains weight and the search time
decreases again to settle at the purely dimeric search time.

Assumptions and limitations 

We made a number of simplifying assumptions in our coarse-grained theoretical 
model. For instance, we assumed that the DNA-binding energy of the dimer is the 
sum of the binding energies of the monomers. While dimerizing, the monomers 
may undergo conformational changes that affect the DNA-binding strength [36], 
possibly speeding up the dimeric search. In that case, the peak of the cooperative 
search time as a function of ω can be even more pronounced than in our model.
For simplicity, we assumed identical binding properties of the two TFs A and
B; however, this assumption is without loss of generality and the extension to
asymmetric cases is straightforward. We performed the analysis reported here 
under the assumption of a non-specific background, although we have formulated 
our model and the theoretical methods to also cover the more general case of a 
heterogeneous DNA background. A brief analysis of the heterogeneous case has 
shown that the most significant effect of the heterogeneous background is to slow 
down the search time in all regimes. For our model, we have also assumed that 
the cooperativity between the TFs is mediated by a direct interaction. Indirect 
cooperativity mediated e.g. by DNA bending or looping has the same steady-state 
properties as direct cooperativity in the low ω regime. However, these indirect
mechanisms lead to different steady-state behavior at large ω values and to different
kinetics. A detailed analysis of these mechanisms is beyond the scope of this study. 

Biological ramifications and examples 

A central and robust result of our theoretical study is that one can distinguish 
three qualitatively distinct regimes of TF-TF interaction strengths for transcrip- 
tion regulation: 

(i) Weak interactions, with a cooperativity ω < 10^ – 10^, suffice to implement
regulation functions with moderate fold-changes, on the order of 10-fold, in the
transcription level. In this regime, the cooperative search time is only moderately
elevated above the search time of a single TF (also on the order of 10-fold). In
bacteria, where the search time of a single TF molecule is around one minute
[11], the parallel cooperative search of 10-100 copies of each TF would then
still result in fast responses on the minute timescale. The principal advantage of
this regime from a design point of view is that TFs with weak interactions are
flexible components, which can be used to control different genes in different ways,
alone or cooperatively in various combinations. Each TF then only needs to
be separately optimized for monomeric search (via the non-specific protein-DNA
interaction), while cooperative regulation by pairs of TFs is still sufficiently fast.

(ii) Interactions of intermediate strength, with ω values in the approximate
range of ω ~ 10^ – 10^, lead to cooperative search kinetics that are prohibitively
slow, due to an excessive number of missed encounters. Recent single-molecule
experiments have been able to monitor the search process of a single TF in vivo
[11]. Our prediction of slow cooperative search kinetics could in principle be
verified using two-color fluorescence methods. Alternatively, one could measure the
transcriptional response time of a synthetically designed, cooperatively regulated
gene with a rapid reporter. We also expect that TF-TF interactions within this
intermediate regime are avoided by cells. A test of this implication of our study will
require a large dataset quantifying a significant subset of the TF-TF interactions
in a model organism. To our knowledge, a quantitative high-throughput assay for
TF-TF interactions is not yet available and remains an experimental challenge
in the field. Instead, we discuss several specific biological examples below.

(iii) Strong interactions, with a cooperativity ω > 10^ – 10^, allow high
fold-changes and a passable response time at the cost of losing combinatorial flexibility:
Suppose that each TF signals a different environmental cue, and a set of genes
needs to be activated whenever A is present, whereas another, more specialized
group of genes is to be activated only if both signals are present. In this situation,
a strong heterodimer would not lead to a favorable regulatory design, since the
regulation of the unconditional genes by A would be strongly affected by the presence
of B. In other words, the strong cooperativity can lead to undesired crosstalk.
Nevertheless, this regime of TF-TF interactions is biologically interesting: For
instance, strong homodimers can exploit the cooperative stability mechanism to
improve the robust function of regulatory circuits [38]. Also, in cases where the
combinatorial flexibility described above is not needed, strong heterodimers could
be used to perform a very sharp, AND-like signal integration. This signal
integration can be made very rapid by tuning the non-specific protein-DNA
interaction of the TFs into a weaker regime, such that the dimer DNA binding ratio
P_d/P_c is closer to the optimal value 1 for search on the DNA. As Fig. 4 shows, this
would lead to a concomitant decrease of the monomer binding ratio. For TFs that
work in this regime, we therefore expect that monomers spend less than 50 % of
their time bound on DNA. So far, the DNA binding ratios of transcription factors
have not been assayed on a large scale. Such an experiment would yield interesting
clues about the design and the mode of operation of these TFs.

Finally, we discuss biological examples. Currently, 383 operons in E. coli are 
known to be transcriptionally regulated by two or more TFs (see Section S4). 
However, it is not known what fraction of these regulatory interactions involves 
cooperative protein-DNA binding. One well-studied case of co-dependent activa- 
tion is the melAB promoter, where CRP and MelR bind cooperatively and activate 
transcription [19]. The interaction of CRP and MelR occurs via a weak surface 
contact and the binding of either is found to be reduced if the binding of the partner 
is impeded. In the presence of both, the transcription rate is increased tenfold [19].
This case is a good example for our regime (i). It is interesting to note that the 
binding sites of CRP and MelR in the melAB promoter display a relatively poor 
match to the consensus sequence, which is consistent with our assumption that 
the target binding energies are evolutionarily tuned. Also, CRP is a well known 
global regulator that controls many other genes in different ways, and hence the 
combinatorial flexibility achieved with a small cooperativity ω appears to be
amply exploited by E. coli. Other examples of prokaryotic co-activation are the ansB
promoter, activated by CRP and FNR [15], and the activation of the mapEP pro- 
moter by CRP and MalT [TH |39] . More generally, the regime (i) corresponds to the 
regulated recruitment mechanism for transcription regulation [29], which appears 
to be widely used in eukaryotes. Indeed, the case of the melAB promoter described 
above has been described as a bacterial version of eukaryotic enhanceosomes [19]. 
A prokaryotic example for regime (iii) may be the RcsA/RcsB heterodimer which is 
required to activate capsule expression through the RcsF phosphorylation cascade 
[10]. Interestingly, RcsB can also form homodimers and regulate the transcription
of other genes by itself, suggesting that this TF may be optimized to always search
and function as a dimer (homo- or heteromeric).

Conclusion 

We reported a biophysical analysis of the design principles for TF-TF
interactions. The exploration of our theoretical model led us to two functionally
favorable regimes for the cooperativity ω, corresponding to weak, glue-like promiscuous
interactions and very strong heterodimerization, respectively. Cells appear to
implement both favorable regimes, but in different biological contexts. On the other
hand, our model predicts that the search kinetics will be prohibitively slow at
intermediate ω values, at least when the protein copy number is small as is
typically the case for transcription factors. Hence the intermediate ω-regime appears
undesirable in this functional context. This prediction could be tested with
experimental approaches from single-molecule biophysics. Currently, there is only limited
biochemical data available for the cooperativity values involved in transcription
regulation, typically from in vitro experiments with selected DNA-binding
proteins. Once more data becomes available, it will be interesting to see whether the
intermediate ω-regime is indeed avoided.

Figure 1: Three schematic examples for cooperative protein-DNA binding in gene
regulation. In the signal transfer scenario (A) RNAp is recruited by an activating 
TF, whereby the signal conveyed by the TF is transferred to the transcription 
level. Scenarios (B) and (C) are both examples for signal integration. In scenario 
(B), an activator is assisted by a helper protein which does not contact RNAp 
itself. In scenario (C), two different TFs bind cooperatively and contact RNAp. 
These and other motifs are used by cells to implement regulatory functions [501111], 
although the actual arrangement of TF binding sites in bacterial genomes is often 
more complicated, involving a larger number of sites [42].

Figure 2: Illustration of the energy levels and the kinetic model for the two TF
species system with a non-specific genomic background. (A) Binding of TFs to
the DNA reduces the energy by E_ns < 0 compared to the unbound reference state
with energy E_free = 0. Additional energy can be gained through sequence specific
contacts (not shown). Upon dimerization of TFs in solution or on the DNA the
energy is further reduced by the interaction energy E_int < 0. The TFs bind to their
target site with a specific binding energy E_T. At small dimerization energies E_int,
full promoter activation will be reached via the "monomer pathway", where TFs
arrive at their target independently and consecutively. At large E_int, on the other
hand, TFs will pre-dimerize in the DNA background or in solution and arrive
at the targets simultaneously through the "dimer pathway". (B) Transcription
factors dimerize in solution and bind to the DNA in diffusion limited binding
reactions with a rate constant k_a. The dissociation rate of a free dimer and
the dissociation rate k_ns of a TF from a DNA site depend on the corresponding
energies and follow from detailed balance as explained in the main text. Dimers
and monomers can randomly diffuse along the DNA with a rate k_sl, which becomes
site dependent when the binding energy is sequence specific. When the dissociation
of a monomer requires the simultaneous dissociation from a cooperatively bound
partner, its off-rate k_ns decreases by a factor 1/ω.

Figure 3: Characterization of the cooperative search process and steady state
levels as a function of the cooperativity ω and for different on-state levels p_AB, given
N = 1 molecule of each TF species. (A) Dimerization probability P_dimer at
equilibrium. The dimerization threshold is given by the entropic cost of dimerization and
corresponds approximately to the length of the genome L_G. (B) The fold-change
φ increases with the cooperativity as √ω below the dimerization threshold and
then approaches a maximal value. (C) The cooperative search time ⟨τ⟩ displays a
maximum at an intermediate cooperativity. For large ω, the search time decreases
again and settles at an on-state independent value, corresponding to the dimer
search time, cf. Fig. 4. (D) The probability W_D that the cooperative target state
is reached via the dimer pathway is distinct from P_dimer in (A), since
independent monomeric search and dimeric search have different time scales. Note that
the transition from the monomer to the dimer pathway marks the position of the
maximal search time.







Figure 4: Average search times ⟨τ⟩ of a dimer (black) and a monomer (gray) for
the target site. The curves are obtained from
⟨τ⟩ = √(π L_G / (16 k_sl k_a)) · (√(P_d/P_c) + √(P_c/P_d)),
which predicts an optimum at a binding ratio of one, P_d/P_c = 1, see
Refs. [11, 31]. For larger binding ratios, TFs spend too much time exploring nearby
sites with redundant one-dimensional diffusion, whereas TFs spend too much time
unbound in solution when TF-DNA binding is weaker. Since dimers bind DNA
more strongly than monomers, the binding ratio P_d/P_c of the dimer (indicated
on the top x-axis) is consistently larger than that of the monomers (bottom
x-axis). Hence dimers and monomers cannot simultaneously operate in the search
optimum.






Supporting Material 



S1 Exact calculation of steady-state activities

Single TF molecules. We first treat the case where the cell contains only a
single molecule of each TF species, N_A = N_B = 1. The equilibrium statistics
of the system is described by the canonical ensemble of statistical physics. The
appropriate Boltzmann weight for a single TF binding to one of L_G sites in a
non-specific DNA background is q_ns = exp(−E_ns) (see below for the most general case
with an arbitrary background and larger TF numbers). For a purely non-specific
background and S = V_cell/V_TF ≈ L_G unbound states, the partition function is

\[
Z_{back} = L_G (L_G - 2L)\, q_{ns}^2 + S^2 + 2 S L_G\, q_{ns} + \omega\, (L_G\, q_{ns}^2 + S) . \qquad (S1)
\]

The first three terms describe the non-interacting states, where A and B are either
separately bound to the DNA at non-adjacent sites, or both are free but not
dimerized, or one is DNA-bound and the other is free. The fourth term corresponds
to the states where A and B are dimerized, either on the DNA or unbound.
The fraction of dimers in the background corresponds to the ratio of the weights
of the dimerized states to the weight of all possible states, ω(L_G q_ns² + S)/Z_back.
Rewriting this expression in terms of the monomer DNA binding ratio a = P_d/P_c =
q_ns L_G/S, one obtains

\[
P_{dimer}(a, \omega) = \frac{\omega}{\omega + S\,(a^2 + 2a + 1)/(a\, q_{ns} + 1)} . \qquad (S2)
\]

For a binding ratio of one, i.e. when the monomers are optimized for independent
search, P_dimer(ω) = ω/(ω + 2L_G), which is the case plotted in Fig. 3A. Here, a
dimerization probability of 0.5 is reached at ω_1/2 = 2L_G, while we would have
ω_1/2 = S for a → 0 and ω_1/2 = L_G for a → ∞.
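
The half-dimerization points quoted above can be checked numerically from the four weights of the background partition function. The following is a minimal sketch; the function name, the argument names and the example values (L_G = S = 10^6) are assumptions made for illustration and are not taken from the text.

```python
def dimer_fraction(omega, genome_length, solvent_states, q_ns, tf_length=0):
    """Dimer fraction for one A and one B molecule in a non-specific background.

    The weights mirror the four terms of the background partition function:
    both TFs on non-adjacent DNA sites, both free, one bound and one free,
    and the dimerized states (on the DNA or in solution) weighted by omega.
    """
    L_G, S, L = genome_length, solvent_states, tf_length
    # Non-interacting states.
    z_free = L_G * (L_G - 2 * L) * q_ns**2 + S**2 + 2 * S * L_G * q_ns
    # Dimerized states.
    z_dimer = omega * (L_G * q_ns**2 + S)
    return z_dimer / (z_free + z_dimer)


# Illustrative values only: S = L_G = 1e6 and a binding ratio of one (q_ns = 1).
half_point = dimer_fraction(2e6, 1e6, 1e6, 1.0)
```

For these example values `half_point` evaluates to 0.5, i.e. the dimerization threshold sits at ω = 2L_G when the binding ratio is one and the TF length is neglected against the genome length.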



Eq. S1 provides the binding statistics on non-target states. To study the full
system, we add the target states with weights q_T = exp(−E_T) for the full partition
function

\[
Z_{tot} = Z_{back} + [\,2(L_G - L - 1)\, q_{ns} + S\,]\, q_T + \omega\, q_T^2 , \qquad (S3)
\]

where the second term is the weight of a single occupied target and the third term
is the weight for both targets to be occupied simultaneously. Hence the double
target occupation probability is p_AB = ω q_T²/Z_tot. This equation can be interpreted
as a quadratic equation for q_T for given values of p_AB and ω (since our analysis
assumes a fixed p_AB corresponding to the optimal occupation probability of the
targets in the ON-state). Hence we obtain an explicit expression for E_T(ω, p_AB)
(not shown), which we use throughout this paper to determine E_T for the kinetic
model and stochastic simulations in the N = 1 case. Furthermore, to calculate
the fold-change φ = p_AB/p_A at a given E_T(ω, p_AB) we determine the probability
of single TF target binding p_A in the absence of a partner. By calculating the
partition function for a system of a single TF, we find

\[
p_A = \frac{e^{-E_T(\omega, p_{AB})}}{(L_G - 1)\, q_{ns} + S + e^{-E_T(\omega, p_{AB})}} . \qquad (S4)
\]

For small ω, this probability scales as ~ ω^(-1/2).



Multiple TF molecules. For the case of multiple TF molecules, we calculate 
the exact equilibrium statistics of our full model using the standard transfer matrix 
approach from statistical physics, see e.g. [I1I2]- The calculation is based on the 
grand canonical ensemble, i.e. the average copy numbers N_A, N_B of the proteins
A and B are set by the corresponding chemical potentials μ_A, μ_B. The total
partition function Z of the complete system then factorizes,

\[
Z = Z_d\, Z_c , \qquad (S5)
\]

into a product of a "DNA partition function" Z_d involving only the DNA-bound
states of the TFs and a "cytosol partition function" Z_c involving only the unbound
states (the factorization is possible because DNA-bound TFs do not interact with
unbound TFs and because the TF numbers are not conserved in the grand
canonical ensemble). Due to the low TF concentrations in the cytosol, steric exclusion
between unbound TFs is negligible, and Z_c takes the simple form

\[
Z_c = \left( 1 + e^{\mu_A} + e^{\mu_B} + \omega\, e^{\mu_A + \mu_B} \right)^S , \qquad (S6)
\]

where S ≈ L_G is the number of solvent states (i.e. the ratio of the cell volume
to a characteristic TF volume, S = V_cell/V_TF) and the statistical weight for an
unoccupied solvent state is one. For the calculation of the DNA partition function
Z_d, we do take the steric exclusion of DNA-bound TFs into account. The number
of base pairs covered by a single TF molecule is denoted by L. Each base pair
i = 1 ... L_G on the genome can then be in one of 2L + 1 states: In state 0, the
base pair is not covered by a TF. In state 1, it is the leftmost contact position of
a TF of type A, in state 2 it is the second leftmost contact position, and so on,
up to state L corresponding to the rightmost contact position of A. States L + 1
up to 2L are analogous for B. The transfer matrix Q_i describes the statistical
coupling between the states of the neighboring DNA positions i and i + 1. Each
Q_i is a square matrix of dimension 2L + 1, defined such that the partition function
is equal to the trace of the (ordered) product of all transfer matrices,



\[
Z_d = \mathrm{Tr} \prod_{i=1}^{L_G} Q_i \qquad (S7)
\]



for a circular DNA with L_G basepairs (for a linear DNA molecule, the trace
operation would have to be replaced by multiplication of a row vector from the left
and a column vector from the right, with the vector components properly chosen
to enforce the boundary conditions). Let us denote by [Q_i]_ss' the element in row
s and column s' of the transfer matrix at position i. It takes on a non-negative
value, which corresponds to the conditional statistical weight of finding position
i in state s', provided that position i − 1 is in state s. Thus, each [Q_i]_ss' is a
Boltzmann factor that accounts for the contribution to the total configurational
energy that stems from position i and its interaction with position i + 1. The
Boltzmann factor is zero if the two states are incompatible (overlapping TFs or a
single TF binding to non-contiguous basepairs). The non-zero entries of Q_i
contain the protein-DNA binding energy landscapes E_i^A and E_i^B, the cooperativity
ω, and the chemical potentials. For illustration, we show the transfer matrix Q_i
for TFs of length L = 2,



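\[
Q_i = \begin{pmatrix}
1 & e^{\mu_A - E_i^A} & 0 & e^{\mu_B - E_i^B} & 0 \\
0 & 0 & 1 & 0 & 0 \\
1 & e^{\mu_A - E_i^A} & 0 & \omega\, e^{\mu_B - E_i^B} & 0 \\
0 & 0 & 0 & 0 & 1 \\
1 & e^{\mu_A - E_i^A} & 0 & e^{\mu_B - E_i^B} & 0
\end{pmatrix} \qquad (S8)
\]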



The entries with value one reflect the mere compatibility of neighboring states 
without an energetic contribution (e.g., when position i − 1 is in state 1, position
i must be in state 2, and there is no additional energy contribution to take into 
account). Note that we assume a directional interaction between the TFs A and 
B (the attractive contact only occurs when B is bound directly downstream from 
A). 
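
To make the transfer matrix construction concrete, the sketch below applies it to a deliberately reduced case: a single TF species covering two base pairs on a short circular DNA, so that each position has only three states. This is not the five-state matrix of the full model; the function names are hypothetical, and `weight` stands for the Boltzmann weight of a TF whose leftmost contact is at the given position.

```python
def matmul(a, b):
    """Multiply two square matrices given as nested lists."""
    n = len(a)
    return [[sum(a[i][k] * b[k][j] for k in range(n)) for j in range(n)]
            for i in range(n)]


def transfer_matrix(weight):
    """Transfer matrix for one TF species covering two base pairs.

    Row index: state of position i-1; column index: state of position i.
    States: 0 = uncovered, 1 = first contact of a TF, 2 = second contact.
    """
    return [
        [1.0, weight, 0.0],  # after an uncovered position: empty or new TF
        [0.0, 0.0, 1.0],     # after the first contact: must be second contact
        [1.0, weight, 0.0],  # after the last contact: empty or new TF
    ]


def partition_function(weights):
    """Trace of the ordered product of transfer matrices on circular DNA."""
    product = [[1.0 if i == j else 0.0 for j in range(3)] for i in range(3)]
    for weight in weights:
        product = matmul(product, transfer_matrix(weight))
    return sum(product[i][i] for i in range(3))
```

For four positions with site-dependent weights w1..w4 the trace reproduces the direct enumeration 1 + (w1 + w2 + w3 + w4) + (w1 w3 + w2 w4): the empty DNA, the four single-TF placements, and the two ways of placing two non-overlapping TFs on the ring.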



From the partition function (S5), we can obtain exact expressions for the
occupation probabilities of DNA sites by differentiation. For instance, the probability
that a TF molecule of type A is bound to the site starting at position i on the
DNA is

\[
p_i^A = -\frac{\partial}{\partial E_i^A} \log Z . \qquad (S9)
\]



The derivative is straightforward to evaluate explicitly, leading to an expression
of the form p_i^A = Z_d'/Z_d, where the restricted partition function Z_d' has the same
form as (S7), but with a projection matrix next to Q_i inside the trace. This exact
expression is easily computed numerically, in particular when large parts of the
binding energy landscapes E_i^A and E_i^B are flat (equal to the non-specific binding
energy E_ns), since large parts of the product in (S7) then reduce to matrix powers
(which are quickly calculated via diagonalization). Similarly, the probability of
cooperative binding at site i is calculated starting from the expression

where the derivatives enforce that a B molecule is bound directly adjacent to the
A molecule, such that together they cover the DNA positions from i to i + 2L − 1.
Finally, the average number of TF molecules in the system at given values of the
chemical potentials μ_A, μ_B is obtained by summing over the occupation numbers
of all states, e.g.

1=1 

Similarly, the average number of dimers in the system is 

N.^.. = Y.Vr^-^ , (S12) 

j=i 

from which the fraction of dimers, P_dimer(ω) = N_dimer/N, in Fig. 6A is computed.
The fold-change in Fig. 6B is calculated as the ratio of the dimer occupancy
(S10) at the target site pair in the presence of both TFs (μ_A = μ_B = μ such that
N_A = N_B = N) to the monomer occupancy (S9) at its target site when only one
TF is present (μ_A chosen such that N_A = N while μ_B is set to a large negative
value such that N_B ≈ 0).

The above framework can be used to calculate any equilibrium observable ex- 
actly for our full model and it also provides a reference point for our kinetic sim- 
ulations, which produce equilibrium values in the long-time average. However, it 
is also useful to derive a simple approximation to the exact solution of the mul- 
tiple TF molecule case, which still incorporates the effect of a (nonspecific) DNA 
background, but neglects steric exclusion between the TFs in the background. As- 
suming e'^"= <C 1 and A^ ^ Lq, and again taking a DNA binding ratio of one, such 
that S'e^°'= = -^^G, we find 
which leads to the background dimerization fraction



-Pdimer('^) ~ 1 " 




(S14) 



that we use in Eq. 4 of the main text for the approximative form of the cooperative 
search time. 



S2 Stochastic simulation of cooperative search 
kinetics 

To study the cooperative search process within the full reaction scheme of Fig. 2B, 
we implemented a kinetic Monte Carlo simulation based on the standard Gillespie 
algorithm. For our simulations, we used fixed numbers, Na and Nb, of A and B 
molecules (i.e., any equilibrium values computed in these simulations correspond 
to thermodynamic averages in the canonical ensemble). The state of the system is 
specified by the state of each TF molecule, which can be either free or dimerized in 
solution, or bound to the DNA at position p. The simulations generate stochastic 
continuous-time trajectories in this discrete state space. Each simulation step 
consists of one of the moves depicted in Fig. 2B; however, the set of available
moves depends on the current state of the system. In particular, moves that
would violate the steric constraint that each DNA basepair can be in contact
with only a single TF molecule cannot be chosen. Thus, TF molecules cannot, for
instance, change the order in which they are bound along the DNA solely via
sliding moves.

To measure the average cooperative search time ⟨τ⟩, we perform 100
simulations for each set of model parameters. Each simulation run is initialized in the
state where all molecules are unbound (this mimics the condition of a cell prior to
receiving a signal that triggers allosteric activation of TF-DNA binding), and
terminated once the two adjacent target sites are both occupied simultaneously.
The data points in Fig. 3C, Fig. 4, and Fig. 6C correspond to the simulation time 
averaged over the 100 runs. Another observable of interest here is the relative 
contribution of the dimer pathway to the search process, as shown in Fig. 3D 
and Fig. 6D. This observable corresponds to the fraction of simulation runs where 
the final state is reached by a dimer move, such that both targets simultaneously 
become occupied by their cognate TF molecule. 
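
The measurement procedure described above (repeated runs, each started in the unbound state and stopped when the target is reached) can be illustrated with a Gillespie simulation of a much smaller system. The sketch below is an assumption-laden stand-in: instead of the full reaction scheme with sliding and steric exclusion it simulates only a coarse-grained four-state scheme (2 = both TFs unbound, 1 = dimerized, 3 = one TF on its target), and all move names and rate values are hypothetical.

```python
import random


def gillespie_search(rates, rng):
    """One stochastic trajectory of the four-state scheme until the target is hit.

    Returns (search_time, reached_via_dimer). `rates` maps hypothetical move
    names to rate constants.
    """
    moves_by_state = {
        2: [("dimerize", 1), ("mono_first", 3)],
        1: [("dimer_search", "target_dimer"), ("undimerize", 2)],
        3: [("mono_second", "target_mono"), ("mono_loss", 2)],
    }
    state, time = 2, 0.0
    while state in moves_by_state:
        moves = moves_by_state[state]
        total = sum(rates[name] for name, _ in moves)
        # Waiting time to the next event is exponential with the total rate.
        time += rng.expovariate(total)
        # Pick the move with probability proportional to its rate.
        threshold = rng.random() * total
        for name, next_state in moves:
            threshold -= rates[name]
            if threshold < 0.0:
                break
        state = next_state
    return time, state == "target_dimer"


def simulate(rates, n_runs=100, seed=0):
    """Average search time and dimer-pathway fraction over n_runs trajectories."""
    rng = random.Random(seed)
    results = [gillespie_search(rates, rng) for _ in range(n_runs)]
    mean_time = sum(t for t, _ in results) / n_runs
    dimer_fraction = sum(1 for _, via_dimer in results if via_dimer) / n_runs
    return mean_time, dimer_fraction
```

The default of 100 runs mirrors the number of simulations per parameter set quoted above; the fraction of runs that end with a dimer move is the estimate of the dimer pathway weight.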
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S3 Analytical description of the cooperative search 
kinetics 

Here, we develop a simplified analytical description of the cooperative search
kinetics, which distinguishes only the target occupation states and the two
search modes (dimeric vs. monomeric). As shown in Fig. S1, this description
corresponds to a kinetic scheme with four states and six effective rates. The
scheme amounts to two competing Michaelis-Menten type processes which lead to
the same final state. The initial state 2 corresponds to the state of our TF-DNA
system where both proteins are unbound. From there, the target state can either
be reached via state 1 (dimer pathway) or via state 3 (monomer pathway). The
dimer pathway is kinetically characterized by the effective dimerization rate
$r_2^-$, the effective dissociation rate $r_1^+$, and the dimer search rate
$r_1^- = 1/\langle\tau_d\rangle$. Similarly, the monomer pathway is
characterized by the three rates $r_2^+$, $r_3^+$, and $r_3^-$. Since state 3
does not distinguish whether A or B is bound, the rate
$r_2^+ = 2/\langle\tau_m\rangle$ is twice the monomer search rate. In contrast,
the rate $r_3^+ = 1/(2\langle\tau_m\rangle)$ corresponds to only half the search
rate of a monomer because one target is already occupied and the other target is
accessible from one side only. Finally, $r_3^-$ is the total rate at which a
monomer dissociates from its target, either via sliding or unbinding.
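
The kinetic scheme just described can be explored with a small stochastic
simulation. The sketch below rests on several assumptions that are not taken
from the text: the numerical rate values are arbitrary, the superscripts + and -
are read as jumps to the neighboring state with the higher and lower index, and
the single target state is given two labels (0 when reached from state 1, 4 when
reached from state 3) so that the exit pathway can be recorded.

```python
import random

# Hypothetical rates in arbitrary units (not values from the paper).
RATES = {
    1: {"minus": 2.0, "plus": 0.5},  # dimer finds target / dimer dissociates
    2: {"minus": 1.0, "plus": 2.0},  # dimerization / first monomer binds a target
    3: {"minus": 1.0, "plus": 0.5},  # bound monomer is lost / second monomer binds
}


def simulate_once(rng, start=2):
    """One Gillespie trajectory; returns (exit time, pathway label)."""
    state, t = start, 0.0
    while state not in (0, 4):
        r_minus = RATES[state]["minus"]
        r_plus = RATES[state]["plus"]
        total = r_minus + r_plus
        t += rng.expovariate(total)  # exponential waiting time in this state
        state += 1 if rng.random() < r_plus / total else -1
    return t, ("dimer" if state == 0 else "monomer")


def estimate(n_runs=20000, seed=1):
    """Average exit time and dimer-pathway fraction over n_runs trajectories."""
    rng = random.Random(seed)
    results = [simulate_once(rng) for _ in range(n_runs)]
    mean_time = sum(t for t, _ in results) / n_runs
    dimer_fraction = sum(1 for _, path in results if path == "dimer") / n_runs
    return mean_time, dimer_fraction


mean_time, dimer_fraction = estimate()
```

For these particular rates, solving the three linear first-passage equations by
hand gives a mean time of 41/22 and a dimer-pathway fraction of 6/11, so the
estimates should scatter around those values.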

We can express the three remaining undetermined rate constants $r_2^-$,
$r_1^+$, and $r_3^-$ in terms of our underlying model parameters. For arbitrary
binding energy landscapes, the effective dimerization rate is

^2 = E + Pfpf+L+l + kapfP^ + KpfP^] + KP^P^ , (S15) 

where we have used the equilibrium probabilities introduced above in section A 
of 'Methods', and P/, denote the equilibrium probabilities for the TFs to be 
unbound in solution. The rates kf'^ and kf'~ denote the forward and backward 
sliding rates from position i, see section 'Full model'. Using our approximations 
from section A for a non-specific background, we find the simpler form for the 
effective dimerization rate 

= - ^a) Pi + K , (S16) 

where P^, = 1 - P^ = 1 - Pc is the probability to find a TF molecule bound to
DNA. Similarly, the effective dissociation rate has the general form

AB 

= E ^ (^^°' + C + kt + kSl) + k, P^^ , (S17) 

where kf'"^ denotes the site-specific DNA-unbinding rate for A and P^^ is the
probability to find the two TFs dimerized in solution. The simplified effective
dissociation rate for a non-specific background is

rt = {ksi + kos) + P^"" , (S18) 

where P^^ is the total probability to find the TFs non-specifically bound to the 
DNA as a heterodimer. Finally, the total rate for monomer loss from a target is

$$ r_3^- = k_{\mathrm{off},a} + 2\,k_{\mathrm{sl},a} \,, \tag{S19} $$

where the index $a$ indicates that these are unbinding and sliding rates from
the target site, which are slower than their bulk counterparts by the additional
Boltzmann factor corresponding to the energy difference between the non-specific
binding energy and the target binding energy, see section 'Full model'.

With these rates, the average assembly time of the two TFs on the double target
corresponds to the mean first passage time (MFPT) of a random walker hopping
between the four sites at the given site-dependent jump rates. The random walker
starts at site 2 and terminates on the target site. We use the standard MFPT
formalism as described, for instance, in Ref. [3] to calculate this cooperative
search time. The general formula for the MFPT $\langle\tau(M)\rangle$ starting
from site $M$ on a linear lattice with $N+1$ sites, with the two boundary sites
$0$ and $N$ both absorbing, is

$$ \langle\tau(M)\rangle = W(M) \sum_{m=1}^{N-1} \sum_{n=1}^{m} \frac{1}{r_n^+} \prod_{j=n+1}^{m} \frac{r_j^-}{r_j^+} \;-\; \sum_{m=1}^{M-1} \sum_{n=1}^{m} \frac{1}{r_n^+} \prod_{j=n+1}^{m} \frac{r_j^-}{r_j^+} \,, \tag{S20} $$

where $W(M)$ is the total probability to exit to site $N$,

$$ W(M) = \frac{1 + \sum_{m=1}^{M-1} \prod_{j=1}^{m} r_j^-/r_j^+}{1 + \sum_{m=1}^{N-1} \prod_{j=1}^{m} r_j^-/r_j^+} \,. \tag{S21} $$

Here $r_j^+$ and $r_j^-$ denote the jump rates from site $j$ to sites $j+1$ and
$j-1$, respectively.
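
The general formula can be evaluated directly for a small lattice. The sketch
below is an illustration under stated assumptions rather than the authors'
implementation: it reads $r_j^+$ and $r_j^-$ as the jump rates from site $j$ to
sites $j+1$ and $j-1$, stores them in dictionaries keyed by the site index, and
uses arbitrary rate values. The function names are hypothetical.

```python
def exit_probability(M, N, r_plus, r_minus):
    """W(M): probability to leave via site N (not site 0), starting at site M."""
    def phi(m):
        # Product of r_j^- / r_j^+ for j = 1..m (empty product = 1).
        out = 1.0
        for j in range(1, m + 1):
            out *= r_minus[j] / r_plus[j]
        return out

    return sum(phi(m) for m in range(M)) / sum(phi(m) for m in range(N))


def double_sum(upper, r_plus, r_minus):
    """Double sum of the MFPT formula with the outer index m = 1..upper."""
    total = 0.0
    for m in range(1, upper + 1):
        for n in range(1, m + 1):
            term = 1.0 / r_plus[n]
            for j in range(n + 1, m + 1):
                term *= r_minus[j] / r_plus[j]
            total += term
    return total


def mfpt(M, N, r_plus, r_minus):
    """Mean first passage time from site M to either absorbing boundary."""
    return (exit_probability(M, N, r_plus, r_minus) * double_sum(N - 1, r_plus, r_minus)
            - double_sum(M - 1, r_plus, r_minus))


# Hypothetical rates for the interior sites 1, 2, 3 (arbitrary units).
r_plus = {1: 0.5, 2: 2.0, 3: 0.5}
r_minus = {1: 2.0, 2: 1.0, 3: 1.0}
tau = mfpt(2, 4, r_plus, r_minus)
w = exit_probability(2, 4, r_plus, r_minus)
```

For these rates the products evaluate to 4, 2, and 4, so that W(2) = 5/11, the
two double sums are 8.5 and 2, and the MFPT is 41/22, in agreement with a direct
solution of the three linear first-passage equations.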

For the problem at hand, we have $N = 4$ and $M = 2$. Defining the
Michaelis-Menten-type constants $K_1 = (r_1^+ + r_1^-)/r_2^-$ for state 1 and
$K_3 = (r_3^+ + r_3^-)/r_2^+$ for state 3, we can rewrite the cooperative search
rate, i.e. the inverse average search time, in the compact form

$$ \frac{1}{\langle\tau\rangle} = \frac{r_1^-/K_1 + r_3^+/K_3}{1 + 1/K_1 + 1/K_3} \,, \tag{S22} $$

which is the expression used to obtain the lines in Fig. 3C. In the limit where
$r_2^-$ vanishes, this reduces to the average search rate for two independent
monomers,



$$ \frac{1}{\langle\tau_{a,b}\rangle} = \frac{r_3^+}{1 + K_3} \,. \tag{S23} $$



Using the relation $r_3^- p_a = 2 r_3^+ (1 - p_a)$, we can rewrite the
corresponding search time in the form

$$ \langle\tau_{a,b}\rangle = \left( \frac{3}{2} + \frac{1}{p_a} \right) \langle\tau_m\rangle \,, \tag{S24} $$

which best explains the effect of missed encounters, where $1/p_a$ is the
average number of times a TF must return to the target before finding the other
target occupied. In the small-$\omega$ regime the cooperative search process
corresponds to an independent monomer search and
$\langle\tau\rangle \approx \langle\tau_{a,b}\rangle$. Given that
$p_a \sim \omega^{-1/2}$, this form also explains the
$\langle\tau\rangle \sim \sqrt{\omega}$ scaling of the search time at small
cooperativities.
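
The step from the independent-monomer search rate to the form with $1/p_a$ can
be spelled out as follows. This is a sketch under two assumptions that are not
stated explicitly in the text: $p_a$ is taken to be the equilibrium occupancy of
a single target by its monomer, and detailed balance for that monomer is taken
to read $r_3^- p_a = 2 r_3^+ (1 - p_a)$, with the on-rate
$1/\langle\tau_m\rangle = 2 r_3^+$.

```latex
\begin{align*}
\langle\tau_{a,b}\rangle
  &= \frac{1 + K_3}{r_3^+}
   = \frac{1}{r_3^+} + \frac{1}{r_2^+}\left(1 + \frac{r_3^-}{r_3^+}\right)
   && \text{using } K_3 = \frac{r_3^+ + r_3^-}{r_2^+} \\
  &= 2\langle\tau_m\rangle
     + \frac{\langle\tau_m\rangle}{2}\left(1 + \frac{2(1 - p_a)}{p_a}\right)
   && \text{using } r_2^+ = \frac{2}{\langle\tau_m\rangle},\;
      r_3^+ = \frac{1}{2\langle\tau_m\rangle} \\
  &= \left(\frac{3}{2} + \frac{1}{p_a}\right)\langle\tau_m\rangle .
\end{align*}
```

As a consistency check, $p_a \to 1$ gives $5\langle\tau_m\rangle/2$, i.e. the
time $\langle\tau_m\rangle/2$ for the first of the two monomers to arrive plus
the time $2\langle\tau_m\rangle$ for the second one to reach the remaining
target from one side.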
We can further simplify Eq. S22 by noting that the average search time is
virtually identical (in the parameter regime considered here) when the search
begins in state 1 instead of state 2. With state 1 as the initial state, we find

$$ \langle\tau\rangle = \left( r_1^- P_{\mathrm{dimer}} + \langle\tau_{a,b}\rangle^{-1} \left(1 - P_{\mathrm{dimer}}\right) \right)^{-1} \,. \tag{S25} $$

The first term corresponds to the dimer pathway, while the second term corre- 
sponds to the monomer pathway. As expected, the contribution of either pathway 
depends on the dimerization probability and on the search rate of the respective 
mode. It follows that the relative weight of the dimer pathway can be written as

$$ \frac{P_{\mathrm{dimer}}(\omega)\, r_1^-}{P_{\mathrm{dimer}}(\omega)\, r_1^- + \left(1 - P_{\mathrm{dimer}}(\omega)\right) \langle\tau_{a,b}\rangle^{-1}} \,, \tag{S26} $$

which was used to obtain the lines in Fig. 3D. It is straightforward to
generalize these equations also to the case of $N > 1$, where the dimerization
probability $P_{\mathrm{dimer}}(\omega, N)$ becomes a function of both $\omega$
and $N$, and the search rate for each mode increases by a factor of $N$:
$r_1^- \to N r_1^-$ and $\langle\tau_{a,b}\rangle \to \langle\tau_{a,b}\rangle/N$.
In this case we obtain Eq. 4 from the main text, which is used to obtain the
analytical curves in Fig. S2C. Using the dimerization probability
$P_{\mathrm{dimer}}(\omega, N)$, we also extend Eq. S26 to the case of $N > 1$,
to obtain the curves in Fig. S2D.
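
The two-pathway expressions can be evaluated numerically as sketched below. The
sketch assumes that the dimer search rate equals $1/\langle\tau_d\rangle$ and
that both pathway rates are simply multiplied by the copy number $N$, as stated
in the text; the dimerization probability is passed in as a plain number, since
its dependence on $\omega$ and $N$ is not reproduced here. Function names and
example values are hypothetical.

```python
def cooperative_search_time(p_dimer, tau_d, tau_ab, n_copies=1):
    """Mean search time from the weighted sum of dimer and monomer search rates."""
    rate_dimer = n_copies * p_dimer / tau_d              # dimer pathway
    rate_monomer = n_copies * (1.0 - p_dimer) / tau_ab   # monomer pathway
    return 1.0 / (rate_dimer + rate_monomer)


def dimer_pathway_weight(p_dimer, tau_d, tau_ab):
    """Relative contribution of the dimer pathway to the total search rate.

    A common factor N multiplying both rates cancels here, so N enters only
    through the value of p_dimer supplied by the caller.
    """
    rate_dimer = p_dimer / tau_d
    rate_monomer = (1.0 - p_dimer) / tau_ab
    return rate_dimer / (rate_dimer + rate_monomer)


# Hypothetical example: slow dimer search, fast independent monomer search.
t_single = cooperative_search_time(0.5, tau_d=10.0, tau_ab=2.0)
t_ten = cooperative_search_time(0.5, tau_d=10.0, tau_ab=2.0, n_copies=10)
weight = dimer_pathway_weight(0.5, tau_d=10.0, tau_ab=2.0)
```

The limits behave as expected: a dimerization probability of 0 returns the
independent-monomer time, a probability of 1 returns the dimer search time, and
at fixed dimerization probability the search time scales as 1/N.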



S4 Additional notes 



To obtain an estimate of the number of E. coli operons which are regulated by
two or more transcription factors, we consulted the "RegulonDB" database [4]. At
the time of writing, this database lists 370 E. coli operons as regulated by a
single transcription factor, while 383 operons are listed as regulated by two or
more transcription factors (188 of these are believed to be regulated by exactly
two transcription factors).

Figure S1: Simplified model used to calculate the mean cooperative search time
analytically. In this model only the different target occupation states and the
dimeric vs. monomeric search modes are distinguished. The rates $r_1^-$ and
$r_2^+$, $r_3^+$ correspond to the search rates of dimers and monomers,
respectively, whereas $r_2^-$ and $r_1^+$ are the total rates at which a
dimerization or a dissociation occurs in the monomeric or dimeric state,
respectively. The rate $r_3^-$ refers to the total rate at which a monomer
leaves its target, either by sliding away or by dissociating from it.

Figure S2: Cooperative search times and steady state levels as a function of
$\omega$, given different TF copy numbers $N = 1$, 10, 100, and 1000 at a fixed
$P_{ab} = 0.5$. (A) The dimerization threshold decreases with increasing TF
concentrations whereas the foldchange (B) is independent of the TF number in the
monomeric regime. The maximal foldchange is reached at the dimerization
threshold, which decreases with the TF concentration, such that the maximal
foldchange in (B) decreases as well. The search time (C) scales as $1/N$ in the
purely monomeric and purely dimeric regime. In the intermediate regime, the
maximal search time decreases more strongly than $1/N$ as the onset of the
dimeric pathway (shown in D) moves to lower cooperativities.



